A Linear Size Index for Approximate Pattern Matching

Authors

  • Ho-Leung Chan
  • Tak Wah Lam
  • Wing-Kin Sung
  • Siu-Lung Tam
  • Swee-Seong Wong
Abstract

This paper revisits the problem of indexing a text S[1..n] to support searching for substrings of S that match a given pattern P[1..m] with at most k errors. A naive solution either has a worst-case matching time complexity of Ω(m^k) or requires Ω(n^k) space. Devising a solution with better performance remained a challenge until Cole et al. [5] showed an O(n log^k n)-space index that supports k-error matching in O(m + occ + log^k n log log n) time, where occ is the number of occurrences. Motivated by the indexing of DNA, we investigate in this paper the feasibility of devising a linear-size index that still has a time complexity linear in m. In particular, we give an O(n)-space index that supports k-error matching in O(m + occ + (log n)^{k(k+1)} log log n) worst-case time. Furthermore, the index can be compressed from O(n) words into O(n) bits with a slight increase in the time complexity.
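
To make the k-error matching problem concrete, here is a minimal index-free baseline, not the index described in the abstract. It assumes that an "error" is a single-character insertion, deletion, or substitution (edit distance); the abstract does not specify the error model, so that is an assumption. The function name `k_error_end_positions` is hypothetical. The sketch scans S with a column-by-column dynamic program and reports every end position where some substring of S is within k errors of P, taking O(nm) time per query, which is the cost an index aims to avoid.

```python
def k_error_end_positions(text: str, pattern: str, k: int) -> list[int]:
    """Return 1-based positions j such that some substring of `text`
    ending at j is within edit distance k of `pattern`.

    Assumption: errors are insertions, deletions, and substitutions.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any
    # substring of text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # value for (i-1, j-1)
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            above = col[i]  # value for (i, j-1), before overwriting
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(
                prev_diag + cost,  # match or substitution
                above + 1,         # extra character in the text
                col[i - 1] + 1,    # pattern character missing from the text
            )
            prev_diag = above
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(k_error_end_positions("abcde", "bcd", 0))
    print(k_error_end_positions("abcde", "bcd", 1))
```

With k = 0 the scan reduces to exact matching; raising k admits neighbouring end positions whose best alignment drops or alters one pattern character.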

Similar resources

Parameterized matching on non-linear structures

The classical pattern matching paradigm is that of seeking occurrences of one string in another, where both strings are drawn from an alphabet set Σ. In the parameterized pattern matching model, a consistent renaming of symbols from Σ is allowed in a match. The parameterized matching paradigm has proven useful in problems in software engineering, computer vision, and other applications. In clas...

Indexes for Jumbled Pattern Matching in Strings, Trees and Graphs

We consider how to index strings, trees and graphs for jumbled pattern matching when we are asked to return a match if one exists. For example, we show how, given a tree containing two colours, we can build a quadratic-space index with which we can find a match in time proportional to the size of the match. We also show how we need only linear space if we are content with approximate matches.

An Index for Two Dimensional String Matching Allowing Rotations

We present an index to search a two-dimensional pattern of size m × m in a two-dimensional text of size n × n, even when the pattern appears rotated in the text. The index is based on (path compressed) tries. By using O(n^2) (i.e. linear) space the index can search the pattern in O((log_σ n)^{5/2}) time on average, where σ is the alphabet size. We also consider various schemes for approximate matching,...

FAMOUS: Fast Approximate string Matching using OptimUm search Schemes

Finding approximate occurrences of a pattern in a text using a full-text index is a central problem in bioinformatics and has been extensively researched. The introduction of practical bidirectional indices has opened new possibilities for solving the problem as they allow the search to be started from anywhere within the pattern and extended in both directions. In particular, use of search sch...

A Hybrid Indexing Method for Approximate String Matching

We present a new indexing method for the approximate string matching problem. The method is based on a suffix array combined with a partitioning of the pattern. We analyze the resulting algorithm and show that the average retrieval time is O(n^λ log n), for some λ > 0 that depends on the error fraction tolerated α and the alphabet size σ. It is shown that λ < 1 for approximately α < 1 − e/√σ, where e = 2.718.... The space required is four times...

Publication date: 2006